Analysis of Clustering and Sparse Coding for Feature Learning from Time-Series

Authors

  • Masoumeh Heidari
  • Bonny Banerjee
Abstract

Two classes of relatively simple algorithms have been found to be very effective for unsupervised feature learning: 1) sparse coding, which minimizes the reconstruction error, and 2) clustering, which captures the data distribution. Coates et al. (2011) analyzed the performance of several off-the-shelf feature learning algorithms, such as sparse auto-encoders, sparse RBMs, k-means clustering, and Gaussian mixture models, on the task of image classification. The simplest and most computationally efficient of these, k-means clustering, emerged as the best performer on the CIFAR-10 and NORB datasets. Sparse coding may be construed as a generalization of winner-take-all spherical clustering. Our goal is two-fold: 1) to analyze how meaningful features can be learned from time-series by applying clustering and sparse coding algorithms, and 2) to evaluate the performance of these two classes of algorithms for unsupervised feature learning on different benchmark time-series datasets of speech, physiological measurements (e.g., heart rate) and stock prices. The standard way to deal with a time-series is to sample it using a sliding window; the data distribution within a window is assumed to be stationary. Keogh and Lin (2005) analyzed the challenges of applying clustering to such windows of time-series data. They found that if the overlap between consecutive windows is high, the features learned using clustering are independent of the data and hence meaningless. Meaningful features can be learned using much less overlap between consecutive windows, or by using shift-invariant versions of clustering and sparse coding algorithms as in (Lewicki and Sejnowski, 1999).
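The sliding-window sampling and winner-take-all spherical clustering described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the window width, step size (which sets the overlap), number of features, and the toy sine-wave series are all illustrative assumptions.

```python
import numpy as np

def sliding_windows(series, width, step):
    """Extract fixed-width windows from a 1-D series; step < width
    gives overlapping windows, step == width gives no overlap."""
    n = (len(series) - width) // step + 1
    return np.stack([series[i * step : i * step + width] for i in range(n)])

def spherical_kmeans(X, k, n_iter=50, seed=0):
    """Winner-take-all clustering on the unit sphere: windows and
    centroids are L2-normalized, assignment maximizes cosine similarity."""
    rng = np.random.default_rng(seed)
    Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)
    D = Xn[rng.choice(len(Xn), k, replace=False)].copy()  # initial centroids
    for _ in range(n_iter):
        labels = np.argmax(Xn @ D.T, axis=1)              # winner-take-all step
        for j in range(k):
            members = Xn[labels == j]
            if len(members):
                c = members.sum(axis=0)
                D[j] = c / (np.linalg.norm(c) + 1e-12)    # re-normalize centroid
    return D, labels

# Toy time-series: a noisy sine wave (illustrative data only).
t = np.linspace(0, 20 * np.pi, 2000)
series = np.sin(t) + 0.1 * np.random.default_rng(0).standard_normal(t.size)

X = sliding_windows(series, width=64, step=32)  # 50% overlap between windows
D, labels = spherical_kmeans(X, k=8)
print(D.shape)  # (8, 64): 8 unit-norm features of window width 64
```

Setting `step` close to `width` reduces the window overlap, which is the regime Keogh and Lin (2005) found necessary for the learned features to remain data-dependent.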
We evaluated the two classes of algorithms using a variety of statistical metrics: their ability to reconstruct the data, as measured by fidelity rate or signal-to-noise ratio; their ability to capture the data distribution, as measured by the silhouette coefficient and KL-divergence; and the meaningfulness, structural similarity and significance of the learned features. The findings are as follows. The size of the feature dictionary learned by clustering is smaller than that learned by sparse coding, based on the average usage rate of the features for any given dataset. If enough features are learned using spherical clustering and sparse coding, a signal drawn from a small dataset can be reconstructed equally well from either set of features using an encoding algorithm such as orthogonal matching pursuit. However, as the dataset grows larger, as in TIMIT speech, sparse coding features tend to reconstruct better. As expected, clustering tends to capture the data distribution more closely than sparse coding. Data points represented by the same feature tend to be more similar, and those represented by different features more dissimilar, for clustering as compared to sparse coding. In conclusion, both classes of algorithms can learn meaningful and significant features from time-series data. The superior performance of sparse coding in some cases does not seem to be a consequence of learning a dictionary that represents the structure of the data more faithfully, but of the sparse encoding scheme. The much simpler and computationally efficient spherical clustering can produce comparable, if not better, results in many cases.
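The reconstruction metric mentioned above — encode a signal against a learned dictionary with orthogonal matching pursuit, then measure signal-to-noise ratio — can be sketched as follows. This is a generic greedy OMP in NumPy, not the paper's code; the random dictionary, the 2-sparse test signal, and the sparsity level are illustrative assumptions.

```python
import numpy as np

def omp(D, x, n_nonzero):
    """Greedy orthogonal matching pursuit: encode signal x with at most
    n_nonzero atoms of dictionary D (atoms as rows, unit L2 norm)."""
    residual, support = x.copy(), []
    coef = np.zeros(D.shape[0])
    for _ in range(n_nonzero):
        j = int(np.argmax(np.abs(D @ residual)))  # atom best correlated with residual
        if j not in support:
            support.append(j)
        sol, *_ = np.linalg.lstsq(D[support].T, x, rcond=None)
        coef[:] = 0.0
        coef[support] = sol
        residual = x - D[support].T @ sol         # re-fit x on the current support
    return coef

def snr_db(x, x_hat):
    """Reconstruction quality in dB: signal power over residual power."""
    return 10 * np.log10(np.sum(x**2) / (np.sum((x - x_hat) ** 2) + 1e-12))

rng = np.random.default_rng(1)
D = rng.standard_normal((32, 64))
D /= np.linalg.norm(D, axis=1, keepdims=True)     # unit-norm random dictionary
x = 0.7 * D[3] + 0.3 * D[17]                      # 2-sparse ground-truth signal
coef = omp(D, x, n_nonzero=2)
x_hat = D.T @ coef
print(round(snr_db(x, x_hat), 1))
```

Running the same encoder against a clustering dictionary and a sparse-coding dictionary, as the abstract describes, isolates the effect of the dictionary from that of the sparse encoding scheme.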


Similar resources

Image Classification via Sparse Representation and Subspace Alignment

Image representation is a crucial problem in image processing, where there exist many low-level representations of an image, e.g., SIFT, HOG and so on. However, there is a missing link between low-level and high-level semantic representations. In fact, traditional machine learning approaches, e.g., non-negative matrix factorization, sparse representation and principal component analysis, are employed to d...


Online Streaming Feature Selection Using Geometric Series of the Adjacency Matrix of Features

Feature Selection (FS) is an important pre-processing step in machine learning and data mining. Traditional feature selection methods all assume that the entire feature space is available from the beginning. However, online streaming features (OSF) are an integral part of many real-world applications. In OSF, the number of training examples is fixed while the number of features grows with t...


Stock Price Prediction using Machine Learning and Swarm Intelligence

Background and Objectives: Stock price prediction has become one of the interesting and also challenging topics for researchers in the past few years. Due to the non-linear nature of the time-series data of the stock prices, mathematical modeling approaches usually fail to yield acceptable results. Therefore, machine learning methods can be a promising solution to this problem. Methods: In this...


Face Recognition using an Affine Sparse Coding approach

Sparse coding is an unsupervised method which learns a set of over-complete bases to represent data such as images and video. Sparse coding has attracted increasing attention for image classification applications in recent years. But in cases where we have similar images from different classes, such as in face recognition applications, different images may be classified into the same class, and hen...


Sparse Structured Principal Component Analysis and Model Learning for Classification and Quality Detection of Rice Grains

In scientific and commercial fields associated with modern agriculture, the categorization of different rice types and the determination of their quality are very important. Various image processing algorithms have been applied in recent years to detect different agricultural products. The problem of rice classification and quality detection in this paper is presented based on model-learning concepts, includ...



Journal title:

Volume   Issue

Pages  -

Publication date: 2016